Abstract: In social networks, information and influence 
diffuse among users as cascades. While the importance of 
studying cascades has been recognized in various applications, 
it is difficult to observe the complete structure of cascades 
in practice. Moreover, much less is known on how to infer 
cascades based on partial observations. In this paper we 
study the cascade inference problem following the independent 
cascade model, and provide a full treatment from complexity 
to algorithms: (a) We propose the idea of consistent trees 
as the inferred structures for cascades; these trees connect 
source nodes and observed nodes with paths satisfying the 
constraints from the observed temporal information, (b) We 
introduce metrics to measure the likelihood of consistent trees 
as inferred cascades, as well as several optimization problems 
for finding them, (c) We show that the decision problems 
for consistent trees are in general NP-complete, and that the 
optimization problems are hard to approximate, (d) We provide 
approximation algorithms with performance guarantees on the 
quality of the inferred cascades, as well as heuristics. We 
experimentally verify the efficiency and effectiveness of our 
inference algorithms, using real and synthetic data. 

Keywords: information diffusion; cascade inference 

I. Introduction 

In various real-life networks, users frequently exchange 
information and influence each other. The information (e.g., 
messages, articles, recommendation links) is typically created 
by a user and spreads via links among users, leaving a 
trace of its propagation. Such traces are typically represented 
as trees, namely, information cascades, where (a) each node 
in a cascade is associated with the time step at which it 
receives the information, and (b) an edge from a node to 
another indicates that a user propagates the information to 
and influences its neighbor [4], [12]. 

A comprehensive understanding and analysis of cascades 
benefit various emerging applications in social networks [6], 
[16], viral marketing |[T1, [9], lITTl , and recommendation 
networks [24]. In order to model the propagation of information, 
various cascade models have been developed [8], 
ED, [33]. Among the most widely used models is the 
independent cascade model [16], where each node has only 
one chance to influence its inactive neighbors, and each node 
is influenced by at most one of its neighbors independently. 
Nevertheless, it is typically difficult to observe the entire 
cascade in practice, due to noisy graphs with missing 
data or data privacy policies [21], [29]. It is important to 
develop techniques that can infer the cascades using partial 
information. Consider the following example. 

Figure 1: A cascade of an Ad (partially observed) in a 
social network G from user Ann, and its two possible tree 
representations T1 and T2. 

Example 1: The graph G in Fig. 1 depicts a fraction of a 
social network (e.g., Twitter), where each node is a user, and 
each edge represents an information exchange. For example, 
edge (Ann, Bill) with a weight 0.7 represents that user Ann 
sends an advertisement (Ad) about a released product (e.g., 
"Iphone 4s") to Bill with probability 0.7. To identify the impact of 
an Ad strategy, a company would like to know the complete 
cascade starting from their agent Ann. Due to data privacy 
policies, the observed information may be limited: (a) at 
time step 0, Ann posts an Ad about "Iphone 4s"; (b) at time 
step 1, Bill is influenced by Ann and retweets the Ad; (c) 
by time step 3, the Ad reaches Mary, and Mary retweets 
it. As seen, the information diffuses from one user to his 
or her neighbors with different probabilities, represented by 
the weighted edges in G. Note that the cascade unfolds as 
a tree, rooted at the node Ann. 

To capture the entire topological information of the cas- 
cades, we need to make inferences in the graph-time domain. 
Given the above partially observed information, two such 
inferred cascades are shown as trees T1 and T2 in Fig. 1. T1 
illustrates a cascade where each path from the source Ann to 
each observed node has a length exactly equal to the 
time step at which the observed node is influenced, while 
T2 illustrates a cascade where any path in T2 from Ann to an 
observed node has a length no greater than the observed time 
step when the node is influenced, due to possible delay in 
observation, e.g., Mary is known to be influenced by (instead 
of exactly at) time step 3. The inferred cascades provide 



useful information about the missing links and users that 
are important in the propagation of the information. 

The above example highlights the need to make reason- 
able inference about the cascades, according to only the 
partial observations of influenced nodes and the time at 
or by which they are influenced. Although cascade models 
and a set of related problems, e.g., influence maximization, 
have been widely studied, much less is known on how to 
infer the cascade structures, including complexity bounds 
and approximation algorithms. 

Contributions. We investigate the cascade inference prob- 
lem, where cascades follow the widely used independent 
cascade model. To the best of our knowledge, this is 
the first work towards inferring cascades as general trees 
following the independent cascade model, based on partial 
observations. 

(a) We introduce the notions of (perfect and bounded) consistent 
trees in Section II. These notions capture the inferred 
cascades by incorporating connectivity and time constraints 
in the partial observations. To provide a quantitative measure 
of the quality of inferred cascades, we also introduce two 
metrics in Section II based on (i) the size of the consistent 
trees, and (ii) the likelihood when a diffusion function of 
the network graph is taken into account, respectively. These 
metrics give rise to two optimization problems, referred to as 
the minimum consistent tree problem and minimum weighted 
consistent tree problem. 

(b) We investigate the problems of identifying perfect and 
bounded consistent trees, for given partial observations, in 
Section III and Section IV, respectively. These problems are 
variants of the inference problem. 

(i) We show that these problems are all NP-complete. Worse 
still, the optimization problems are hard to approximate: 
unless P = NP, it is not possible to approximate the problems 
within any constant ratio. 

(ii) Nevertheless, we provide approximation and heuristic 
algorithms for these problems. For bounded trees, the problems 
are $O(|X| \cdot \frac{\log f_{min}}{\log f_{max}})$-approximable, where $|X|$ is the 
size of the partial observation, and $f_{min}$ (resp. $f_{max}$) is the 
minimum (resp. maximum) probability on the graph edges. 
We provide such polynomial approximation algorithms. For 
perfect trees, we show that it is already NP-hard to even 
find a feasible solution. However, we provide an efficient 
heuristic using a greedy strategy. Finally, we address a 
practical special case of the perfect tree problems, which is 
$O(d \cdot \frac{\log f_{min}}{\log f_{max}})$-approximable, where d is the diameter of the 
graph, which is typically small in practice. 

(c) We experimentally verify the effectiveness and the efficiency 
of our algorithms in Section V using real-life data 
and synthetic data. We show that our inference algorithms 
can efficiently infer cascades with satisfactory accuracy. 



Related work. We categorize related work as follows. 

Cascade Models. To capture the behavior of cascades, a 
variety of cascade models have been proposed O, (TS), ifTSl , 
ifTTll , ifTSl , such as the Susceptible/Infected (SI) model [2], the 
decreasing cascade model ifTTl , the triggering model [16], the Shortest 
Path Model [19], and the Susceptible/Infected/Recovered (SIR) 
model ifTSl . In this paper, we assume that the cascades follow 
the independent cascade model [13], which is one of the 
most widely studied models (the shortest path model [19] is 
one of its special cases). 

Cascade Prediction. There has been recent work on cascade 
prediction and inference, with the emphasis on global 
properties (e.g., cascade nodes, width, size) lEl, ifTTIl , [20], 
E3, lEl, EB, El with the assumption of missing data 
and partial observations. The problem of identifying and 
ranking influenced nodes is addressed in [20], [23], but 
the topological inference of the cascades is not considered. 
Wang et al. [33] proposed a diffusive logistic model to 
capture the evolution of the density of active users at a 
given distance over time, and demonstrated the prediction 
ability of this model. Nevertheless, the structural information 
about the cascade is not addressed. Song et al. [31] 
studied the probability of a user being influenced by a given 
source. In contrast, we consider a more general inference 
problem where there are multiple observed users, who are 
influenced at different time steps from the source. Fei et 
al. [11] studied social behavior prediction and the effect of 
information content. In particular, their goal is to predict 
actions on an article based on the training dataset. Budak et 
al. [5] investigated the optimization problem of minimizing 
the number of the possible influencing nodes following 
a specified cascade model, instead of predicting cascades 
based on partial observations. 

All the above works focus on predicting the nodes and 
their behavior in the cascades. In contrast, we propose 
approaches to infer both the nodes and the topology of the 
cascades in the graph-time domain. 

Network Inference. Another line of work studies the network 
inference problem, which focuses on inferring network 
structures from observed cascades over the unknown network, 
instead of inferring cascade structures as trees [10], 
[14]. Manuel et al. [14] propose techniques to infer the 
structure of a network where the cascades flow, based on 
the observation of the time at which each node is affected by a 
cascade. A similar network inference problem is addressed 
in [10], where the cascades are modeled as (Markov random 
walk) networks. The main differences between our work and 
theirs are that (a) we use consistent trees to describe possible 
cascades allowing partial observations; and (b) we focus on 
inferring the structure of cascades as trees instead of the 
backbone networks. 

Closer to our work is the work by Sadikov et al. [29], who 
consider the prediction of cascades modeled as k-trees, a 
balanced tree model. The global properties of cascades such 
as size and depth are predicted based on the incomplete 
cascade. In contrast to their work, (a) we model cascades as 
general trees instead of k-balanced trees; (b) while Sadikov 
et al. [29] assume the partial cascade is also a k-tree and 
predict only the properties of the original cascade, we infer 
the nodes as well as the topology of the cascades only from a set 
of nodes and their activation times, using much less available 
information; (c) the temporal information (e.g., time steps) 
in the partial observations is not considered in [29]. 

Figure 2: Tree representations of a partial observation X = 
{(Ann,0), (Bill, 1), (Mary, 3)}: T3, T4 and T5 are consistent 
trees, while T6 is not. 

II. Consistent Trees 
We start by introducing several notions. 

Diffusion graph. We denote a social network as a directed 
graph $G = (V, E, f)$, where (a) $V$ is a finite set of nodes, 
and each node $u \in V$ denotes a user; (b) $E \subseteq V \times V$ is 
a finite set of edges, where each edge $(u, v) \in E$ denotes 
a social connection via which the information may diffuse 
from $u$ to $v$; and (c) $f$ is a diffusion function which 
assigns to each edge $(u, v) \in E$ a value $f(u, v) \in [0, 1]$, as 
the probability that node $u$ influences $v$. 

Cascades. We first review the independent cascade 
model [16]. We say that information propagates over a graph 
G following the independent cascade model if (a) at any 
time step, each node in G is in exactly one of the three states 
{active, newly active, inactive}; (b) a cascade starts from 
a source node s being newly active at time step 0; (c) a 
newly active node u at time step t has only one chance to 
influence its inactive neighbors, such that at time t + 1, (i) 
if v is an inactive neighbor of u, v becomes newly active 
with probability $f(u, v)$; and (ii) the state of u changes from 
newly active to active, and u cannot influence any neighbors 
afterwards; and (d) each inactive node v can be influenced 
by at most one of its newly active neighbors independently, 
and the neighbors' attempts are sequenced in an arbitrary 
order. Once a node is active, it cannot change its state. 
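
The model above can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: the function name `simulate_independent_cascade`, the dictionary encoding of the diffusion function, and the choice to try neighbors in list order (one of the arbitrary orders the model allows) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate_independent_cascade(edges, source, rng=None):
    """Simulate one cascade under the independent cascade model.

    edges  -- dict mapping a directed edge (u, v) to its probability f(u, v)
    source -- the node that is newly active at time step 0
    rng    -- optional random.Random instance (hypothetical helper argument)

    Returns (parent, time): parent[v] is the node that influenced v,
    time[v] is the step at which v became newly active.
    """
    rng = rng or random.Random()
    out_neighbors = defaultdict(list)
    for (u, v), prob in edges.items():
        out_neighbors[u].append((v, prob))

    time = {source: 0}
    parent = {}
    newly_active = [source]
    step = 0
    while newly_active:
        step += 1
        next_newly_active = []
        # Each newly active node gets exactly one chance per inactive neighbor.
        for u in newly_active:
            for v, prob in out_neighbors[u]:
                # A node already in `time` is active or newly active: skip it,
                # so every node is influenced by at most one neighbor.
                if v not in time and rng.random() < prob:
                    time[v] = step
                    parent[v] = u
                    next_newly_active.append(v)
        # Nodes in `newly_active` now become active and never act again.
        newly_active = next_newly_active
    return parent, time
```

Because every influenced node records exactly one parent, the returned `parent` map always describes a tree rooted at the source, which matches the tree view of cascades used in the rest of this section.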

Based on the independent cascade model, we define a 
cascade C over graph $G = (V, E, f)$ as a directed tree 
$(V_c, E_c, s, T)$ where (a) $V_c \subseteq V$, $E_c \subseteq E$; (b) $s \in V_c$ is the 
source node from which the information starts to propagate; 
and (c) $T$ is a function which assigns to each node $v_i \in V_c$ 
a time step $t_i$, which represents that $v_i$ is newly active at 
time step $t_i$. Intuitively, a cascade is a tree representation of 
the "trace" of the information propagation from a specified 
source node s to a set of influenced nodes. 

Indeed, one may verify that any cascade from s following 
the independent cascade model is a tree rooted at s. 

Example 2: The graph G in Fig. 1 depicts a social graph. 
The trees T1 and T2 are two possible cascades following the 
independent cascade model. For instance, after issuing an ad 
of "Iphone 4s", Ann at time 0 becomes "newly active". Bill 
and Jack retweet the ad at time 1. Ann becomes "active", 
while Bill and Jack are turned to "newly active". The process 
repeats until the ad reaches Mary at time step 3. The trace 
of the information propagation forms the cascade T1. 

As remarked earlier, it is often difficult to observe the 
entire structure of a cascade in practice. We model the 
observed information for a cascade as a partial observation. 

Partial observation. Given a cascade $C = (V_c, E_c, s, T)$, 
a pair $(v_i, t_i)$ is an observation point if $v_i \in V$ is known 
(observed) to be newly active at or by time step $t_i$. A partial 
observation X is a set of observation points. Specifically, 
X is a complete observation if for any $v \in V_c$, there is an 
observation point $(v, t) \in X$. To simplify the discussion, 
we also assume that the pair $(s, 0) \in X$, where s is the source 
node. The techniques developed in this paper can be easily 
adapted to the case where the source node is unknown. 

We are now ready to introduce the idea of consistent trees. 

A. Consistent trees 

Given a partial observation X of a graph $G = (V, E, f)$, 
a bounded consistent tree $T_s = (V_{T_s}, E_{T_s}, s)$ w.r.t. X is a 
directed subtree of G with root $s \in V$, such that for every 
$(v_i, t_i) \in X$, $v_i \in V_{T_s}$, and s reaches $v_i$ by $t_i$ hops, i.e., there 
exists a path of length at most $t_i$ from s to $v_i$. Specifically, 
we say a consistent tree is a perfect consistent tree if for 
every $(v_i, t_i) \in X$ and $v_i \in V_{T_s}$, there is a path of length 
equal to $t_i$ from s to $v_i$. 
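
The two definitions can be checked mechanically. The sketch below is illustrative only: the names `node_depths` and `is_consistent` are hypothetical, the tree is assumed to be given as a list of (parent, child) edges that already belong to G, and the edge lists in the usage example are made up rather than read off the figures.

```python
def node_depths(tree_edges, root):
    """Return the hop distance from root to every node reachable in the tree.

    tree_edges -- iterable of (parent, child) pairs
    Returns None if some node is reached twice, i.e. the edges do not
    form a tree rooted at `root`.
    """
    children = {}
    for u, v in tree_edges:
        children.setdefault(u, []).append(v)
    depth = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            if v in depth:
                return None  # second parent or a cycle through the root
            depth[v] = depth[u] + 1
            stack.append(v)
    return depth


def is_consistent(tree_edges, root, observation, perfect=False):
    """Check the bounded (default) or perfect consistency condition.

    observation -- iterable of observation points (v_i, t_i)
    """
    depth = node_depths(tree_edges, root)
    if depth is None:
        return False
    for v, t in observation:
        if v not in depth:
            return False  # observed node is not connected to the source
        if perfect and depth[v] != t:
            return False  # path length must equal t_i
        if not perfect and depth[v] > t:
            return False  # path length must be at most t_i
    return True


# Hypothetical trees over the users of the running example.
X = [("Ann", 0), ("Bill", 1), ("Mary", 3)]
tree_a = [("Ann", "Bill"), ("Ann", "Jack"), ("Jack", "Mike"), ("Mike", "Mary")]
tree_b = [("Ann", "Bill"), ("Bill", "Mary")]
print(is_consistent(tree_a, "Ann", X, perfect=True))
print(is_consistent(tree_b, "Ann", X), is_consistent(tree_b, "Ann", X, perfect=True))
```

Every perfect consistent tree also passes the bounded check, since a path of length exactly $t_i$ is in particular a path of length at most $t_i$.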

Intuitively, consistent trees represent possible cascades 
which conform to the independent cascade model, as well 
as the partial observation. Note the following: (a) the path 
from the root s to a node $v_i$ in a bounded consistent tree 
$T_s$ is not necessarily a shortest path from s to $v_i$ in G, 
as observed in [22]; (b) the perfect consistent trees model 
cascades when the partial observation is accurate, i.e., each 
time $t_i$ in an observation point $(v_i, t_i)$ is exactly the time 
when $v_i$ is newly active; in contrast, in bounded consistent 
trees, an observation point $(v, t)$ indicates that node v is 
newly active at a time step $t' \le t$, due to possible delays 
in the information propagation, as observed in [6]. 

Example 3: Recall the graph G in Fig. 1. The partial 
observation of a cascade in G is X = {(Ann,0), (Bill, 1), 
(Mary, 3)}. The tree T1 is a perfect consistent tree w.r.t. X, 
while T2 is a bounded consistent tree w.r.t. X. 



Now consider the trees in Fig. 2. One may verify that (a) 
T3, T4 and T5 are bounded consistent trees w.r.t. X; (b) T3 
and T4 are perfect consistent trees w.r.t. X, while T5 is not 
a perfect consistent tree; (c) T6 is not a consistent tree, as 
there is no path from the source Ann to Mary with length 
no greater than 3 as constrained by the observation point 
(Mary, 3). 

B. Cascade inference problem 

We introduce the general cascade inference problem. 
Given a social graph G and a partial observation X, the 
cascade inference problem is to determine whether there 
exists a consistent tree T w.r.t. X in G. 

There may be multiple consistent trees for a partial ob- 
servation, so one often wants to identify the best consistent 
tree. We next provide two quantitative metrics to measure 
the quality of the inferred cascades. Let $G = (V, E, f)$ be a 
social graph, and X be a partial observation. 

Minimum weighted consistent trees. In practice, one often 
wants to identify the consistent trees that are most likely to 
be the real cascades. Recall that each edge $(u, v) \in E$ in a 
given network G carries a value assigned by a diffusion 
function $f(u, v)$, which indicates the probability that u 
influences v. Based on $f(u, v)$, we introduce a likelihood 
function as a quantitative metric for consistent trees. 

Likelihood function. Given a graph $G = (V, E, f)$, a partial 
observation X and a consistent tree $T_s = (V_{T_s}, E_{T_s}, s)$, the 
likelihood of $T_s$, denoted as $L_X(T_s)$, is defined as: 

$$L_X(T_s) = \prod_{(u,v) \in E_{T_s}} f(u, v)$$

Following common practice, we opt to use the log-likelihood 
metric, where 

$$L_X(T_s) = \sum_{(u,v) \in E_{T_s}} \log f(u, v)$$

Given G and X, a natural problem is to find the consistent 
tree of the maximum likelihood in G w.r.t. X. Using log-likelihood, 
the minimum weighted consistent tree problem 
is to identify the consistent tree $T_s$ with the minimum 
$-L_X(T_s)$, which in turn has the maximum likelihood. 
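
As a small worked illustration of the metric, the sketch below computes the weight of a tree as the negated sum of log-probabilities over its edges. The function name `tree_weight` and all probabilities other than the 0.7 on (Ann, Bill) from Example 1 are assumptions made for the example; treating a zero-probability edge as infinite weight is also a choice made here, not something the text specifies.

```python
import math


def tree_weight(tree_edges, f):
    """Return the negated log-likelihood of a tree.

    tree_edges -- iterable of directed edges (u, v) in the tree
    f          -- dict mapping each edge to its diffusion probability
    A smaller weight corresponds to a larger likelihood.
    """
    weight = 0.0
    for edge in tree_edges:
        prob = f[edge]
        if prob <= 0.0:
            # An impossible edge makes the whole tree impossible (assumption).
            return math.inf
        weight -= math.log(prob)
    return weight


# Hypothetical diffusion probabilities.
f = {
    ("Ann", "Bill"): 0.7,
    ("Ann", "Jack"): 0.5,
    ("Jack", "Mary"): 0.5,
    ("Bill", "Mary"): 0.2,
}
tree_a = [("Ann", "Bill"), ("Ann", "Jack"), ("Jack", "Mary")]
weight_a = tree_weight(tree_a, f)
# exp(-weight) recovers the product of the edge probabilities.
print(weight_a, math.exp(-weight_a))
```

Working with sums of logarithms rather than products keeps the objective additive over edges, which is what lets the later sections treat it as an ordinary edge-weighted tree problem.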

Minimum consistent trees. Instead of weighted consistent 
trees, one may simply want to find the minimum structure 
that represents a cascade [25]. The minimum consistent tree, 
as a special case of the minimum weighted consistent tree, 
depicts the smallest cascade with the fewest communication 
steps to pass the information to all the observed nodes. In 
other words, the metric favors those trees that are consistent 
with the given partial observation and have the fewest edges. 

Given G and X, the minimum consistent tree problem is 
to find the minimum consistent trees in G w.r.t. X. 

In the following sections, we investigate the cascade 
inference problem, and the related optimization problems 



using the two metrics. We investigate the problems for 
perfect consistent trees in Section III and for bounded 
consistent trees in Section IV, respectively. 

III. Cascades as perfect trees 

As remarked earlier, when the partial observation X is 
accurate, one may want to infer the cascade structure via 
perfect consistent trees. The minimum (resp. weighted) 
perfect consistent tree problem, denoted as PCTmin (resp. 
PCTw) is to find the perfect consistent trees with minimum 
size (resp. weight) as the quality metric. 

Though it is desirable to have efficient polynomial time 
algorithms to identify perfect consistent trees, the problems 
of searching PCTmin and PCTw are nontrivial. 

Proposition 1: Given a graph G and a partial observation 
X, (a) it is NP-complete to determine whether there is a 
perfect consistent tree w.r.t. X in G; and (b) the PCTmin 
and PCTw problems are NP-complete and APX-hard. 

One may verify Proposition 1(a) by a reduction from 
the Hamiltonian path problem [32], which is to determine 
whether there is a simple path of length $|V| - 1$ in a graph 
$G = (V, E)$. Following this, one can verify that the PCTmin and 
PCTw problems are NP-complete as an immediate result. 

Proposition 1(b) shows that the PCTmin and PCTw problems 
are hard to approximate. The APX class [32] consists 
of NP optimization problems that can be approximated by 
a polynomial time (PTIME) algorithm within some constant 
factor. The APX-hard problems are the problems to 
which every APX problem can be reduced. Hence, the problem 
of computing a minimum (weighted) perfect consistent 
tree is at least as hard as the hardest problems that allow PTIME 
algorithms with a constant approximation ratio. 

It is known that if there is an approximation preserving 
reduction (AFP-reduction) [32] from a problem $\Pi_1$ to a 
problem $\Pi_2$, and if problem $\Pi_1$ is APX-hard, then $\Pi_2$ is 
APX-hard [32]. To see Proposition 1(b), we may construct 
an AFP-reduction from the minimum directed steiner tree 
(MST) problem. An instance of a directed steiner tree 
problem $I = \{G, V_r, V_s, r, w\}$ consists of a graph G, a set of 
required nodes $V_r$, a set of steiner nodes $V_s$, a source node 
r and a function w which assigns to each node a positive 
weight. The problem is to find a minimum weighted tree 
rooted at r, such that it contains all the nodes in $V_r$ and a 
part of $V_s$. We show such a reduction exists. Since MST is 
APX-hard, PCTmin is APX-hard. 

A. Bottom-up searching algorithm 

Given the above intractability and approximation hardness 
results, we introduce a heuristic WPCT for the PCTw 
problem. The idea is to (a) generate a "backbone network" 
$G_b$ of G which contains all the nodes and edges that can 
possibly form a perfect consistent tree, using a set of 
pruning rules, and also rank the observed nodes in $G_b$ with 


Input: graph G and partial observation X. 
Output: a perfect consistent tree T in G. 

1. tree $T = (V_T, E_T)$, where $V_T := \{v \mid (v, t) \in X\}$, 
   set level $l(v) := t$ for each $(v, t) \in X$, $E_T := \emptyset$; 
2. set $V_b := \{v_b \mid \mathrm{dist}(s, v_b) \le t_{max}\}$; 
3. if there is a node v in X and $v \notin V_b$ then return $\emptyset$; 
4. set $E_b := \{(v', v) \mid (v', v) \in E, v' \in V_b, v \in V_b\}$; 
5. for each $v \in V_b$ do 
6.    if there is no $(v_i, t_i) \in X$ such that 
      $\mathrm{dist}(s, v) + \mathrm{dist}(v, v_i) \le t_i$ then 
7.       $V_b := V_b \setminus \{v\}$; 
8.       $E_b := E_b \setminus \{(v_1, v_2)\}$ where $v_1 = v$ or $v_2 = v$; 
9. graph $G_b := (V_b, E_b)$; 
10. list $L := \{(v_1, t_1), \ldots, (v_k, t_k)\}$ 
    where $t_i \le t_{i+1}$, $(v_i, t_i) \in X$, $i \in [1, k - 1]$; 
11. for each $i \in [1, t_{max}]$ following descending order do 
12.    $V_t := V_1 \cup V_2 \cup V_3$, where $V_1 := \{v \mid (v, t_i) \in X\}$; 
       $V_2 := \{v \mid v \in V_T, l(v) = t_i\}$; 
       $V_3 := \{v' \mid (v', v) \in E_b, v \in V_1 \cup V_2, v' \in V_b\}$; 
13.    $E_t := \{(v', v) \mid v' \in V_3, v \in V_1 \cup V_2, (v', v) \in E_b\}$; 
14.    construct $G_t = (V_t, E_t)$; 
15.    $T := T \cup$ PCT_l$(G_t, V_1 \cup V_2, V_3, t_i)$; 
16. if T is a tree then return T; 
17. return $\emptyset$; 

Procedure PCT_l 

Input: a bipartite graph $G_t$, 
       node set $V$, node set $V_s$, a number $t_i$; 
Output: a forest $T_t$. 

1. $T_t := \emptyset$; 
2. construct $T_t$ as a minimum weighted steiner forest 
   which covers $V$ as the required nodes; 
3. for each tree $T_i \in T_t$ do 
4.    $l(r) := t_i - 1$ where $r \in V_s$ is the root of $T_i$; 
5. return $T_t$; 



Figure 3: Algorithm WPCT: initialization, pruning and local 
searching 

the descending order of their time step in X, and (b) perform 
a bottom-up evaluation for each time step in X using a 
local-optimal strategy, following the descending order of the 
time step. 

Backbone network. We consider pruning strategies to remove 
the nodes and the edges that cannot be in 
any perfect consistent tree, given a graph $G = (V, E, f)$ and 
a partial observation $X = \{(v_1, t_1), \ldots, (v_k, t_k)\}$. We define 
a backbone network $G_b = (V_b, E_b)$, where 

• $V_b = \bigcup \{v_j \mid \mathrm{dist}(s, v_j) + \mathrm{dist}(v_j, v_i) \le t_i\}$ for each 
  $(v_i, t_i) \in X$; and 
• $E_b = \{(v', v) \mid v' \in V_b, v \in V_b, (v', v) \in E\}$ 

Intuitively, $G_b$ includes all the possible nodes and edges 
that may appear in a perfect consistent tree for a given 
partial observation. In order to construct $G_b$, a set of 
pruning rules can be developed as follows: if for a node 
$v'$ and each observed node $v$ in a cascade with time step 
$t$, $\mathrm{dist}(s, v') + \mathrm{dist}(v', v) > t$, then $v'$ and all the edges 
connected to $v'$ can be removed from $G_b$. 
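
The pruning rule can be sketched with two kinds of breadth-first search: one forward search from the source, and one backward search from each observed node. This is an illustrative sketch under the assumption that dist denotes hop distance in the directed graph; the names `bfs_hops` and `backbone` are hypothetical.

```python
from collections import defaultdict, deque


def bfs_hops(adjacency, start):
    """Return hop distances from start following the given adjacency lists."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def backbone(edges, source, observation):
    """Return (V_b, E_b) for a list of directed edges and observation points."""
    forward, backward = defaultdict(list), defaultdict(list)
    for u, v in edges:
        forward[u].append(v)
        backward[v].append(u)

    from_source = bfs_hops(forward, source)  # dist(s, v) for every v
    kept_nodes = set()
    for target, t in observation:
        to_target = bfs_hops(backward, target)  # dist(v, target) for every v
        for v, d in from_source.items():
            # Keep v if it can lie on some s-to-target route within t hops.
            if v in to_target and d + to_target[v] <= t:
                kept_nodes.add(v)
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


# Hypothetical graph: only s -> a -> x fits within the two-hop budget of x.
edges = [("s", "a"), ("a", "x"), ("s", "b"), ("b", "c"), ("c", "x"), ("s", "z")]
nodes_b, edges_b = backbone(edges, "s", [("s", 0), ("x", 2)])
print(sorted(nodes_b), sorted(edges_b))
```

The sketch runs one search per observation point, so its cost grows with $|X|$ times the size of the graph; this is a convenience of the illustration rather than a claim about how the preprocessing is implemented in the paper.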

Algorithm. Algorithm WPCT, as shown in Fig. 3, consists 
of the following steps: 

Initialization (line 1). The algorithm WPCT starts by initializing 
a tree T, by inserting all the observation points into 
T. Each node v in T is assigned a level $l(v)$ equal to 
its time step as in X. The edge set is set to empty. 

Pruning (lines 2-10). The algorithm WPCT then constructs 
a backbone network $G_b$ with the pruning rules (lines 2-9). It 
initializes a node set $V_b$ of nodes within $t_{max}$ hops of the source node 
s, where $t_{max}$ is the maximum time step in X (line 2). If 
there exists some node v in X that is not in $V_b$, the algorithm 
returns $\emptyset$, since there is no path from s reaching v within t 
steps for $(v, t) \in X$ (line 3). It further removes the redundant 
nodes and edges that are not in any perfect trees, using the 
pruning rules (lines 5-8). The network $G_b$ is then constructed 
with $V_b$ and $E_b$ at line 9. The partial observation X is also 
sorted w.r.t. the time step (line 10). 

Bottom-up local searching (lines 11-17). Following a 
bottom-up greedy strategy, the algorithm WPCT processes 
each observation point as follows. For each i in $[1, t_{max}]$, it 
generates a (bipartite) graph $G_t$. (a) It initializes a node set 
$V_t$ as the union of three sets of nodes $V_1$, $V_2$ and $V_3$ (line 12), 
where (i) $V_1$ is the set of nodes in the observation points with time 
step $t_i$, (ii) $V_2$ is the set of nodes v in the current perfect consistent 
tree T with level $l(v) = t_i$, and (iii) $V_3$ is the union of the 
parents of the nodes in $V_1$ and $V_2$. (b) It constructs an edge 
set $E_t$ which consists of the edges from the nodes in $V_3$ to 
the nodes in $V_1$ and $V_2$. (c) It then generates $G_t$ with $V_t$ 
and the edge set $E_t$, which is a bipartite graph. After $G_t$ is 
constructed, the algorithm WPCT invokes procedure PCT_l 
to compute a "part" of the perfect tree T, which is an optimal 
solution for $G_t$, a part of the graph $G_b$ which contains all 
the observed nodes with time step $t_i$. It expands T with the 
returned partial tree (line 15). The above process (lines 11-15) 
repeats for each $i \in [1, t_{max}]$ until all the nodes in X are 
processed. Algorithm WPCT then checks if the constructed 
T is a tree. If so, it returns T (line 16). Otherwise, it returns 
$\emptyset$ (line 17). The above procedure is illustrated in Fig. 4. 

Procedure PCT_l. Given a (bipartite) graph $G_t$, and two sets 
of nodes $V$ and $V_s$ in $G_t$, the procedure PCT_l computes for 
$G_t$ a set of trees $T_t = \{T_1, \ldots, T_k\}$ with the minimum total 
weight (line 2), such that (a) each $T_i$ is a 2-level tree with 
a root in $V_s$ and leaves in $V$, (b) the leaves of any two trees 
in $T_t$ are disjoint, and (c) the trees contain all the nodes in 
$V$ as leaves. For each $T_i$, PCT_l assigns its root r in $V_s$ a 
level $l(r) = t_i - 1$ (line 4). $T_t$ is then returned as a part 
of the entire perfect consistent tree (line 5). In practice, we 
may either employ linear programming, or an algorithm for 
the MST problem (e.g., EH) to compute $T_t$. 

Example 4: The cascade T1 in Fig. 1, as a minimum 
weighted perfect consistent tree, can be inferred by algorithm 
WPCT as illustrated in Fig. 4. WPCT first initializes 
a tree T with the node Mary. It then constructs $G_t$ as 
the graph induced by edges (Tom, Mary), (Jack, Mary), and 
(Mike, Mary). Intuitively, the three nodes as the parents of 
Mary are the possible nodes which accept the message 
at time step 2. It then selects the tree with the maximum 
probability, which is a single edge (Mike, Mary), and adds 
it to T. Following Mike, it keeps choosing the optimal 
tree structure for each level, and identifies node Jack. The 
process repeats until WPCT reaches the source Ann. It then 
returns the perfect consistent tree T as the inferred cascade 
from the partial observation X. 

Figure 4: The bottom-up searching in the backbone network 

Correctness. The algorithm WPCT either returns $\emptyset$, or 
correctly computes a perfect consistent tree w.r.t. the partial 
observation X. Indeed, one may verify that (a) the pruning 
rules only remove the nodes and edges that are not in any 
perfect consistent tree w.r.t. X, and (b) WPCT has the loop 
invariant that at each iteration i (lines 11-15), it always 
constructs a part of a perfect tree as a forest. 

Complexity. The algorithm WPCT is in time 
$O(|V||E| + |X|^2 + t_{max} \cdot A)$, where $t_{max}$ is the maximum time step in 
X, and A is the time complexity of procedure PCT_l. Indeed, 
(a) the initialization and preprocessing phase (lines 1-9) 
takes $O(|V||E|)$ time, (b) the sorting phase is in $O(|X|^2)$ 
time, (c) the bottom-up construction is in $O(t_{max} \cdot A)$ time, 
which is polynomially bounded if an approximation 
algorithm is used [28]. In our experimental study, we 
utilize efficient linear programming to compute the optimal 
steiner forest. 

The algorithm WPCT can easily be adapted to the prob- 
lem of finding the minimum perfect consistent trees, where 
each edge has a unit weight. 

Perfect consistent SP trees. The independent cascade model 
may be overkill for real-life applications, as observed 
in [7], [19]. Instead, one may identify the consistent trees 
which follow the shortest path model [19], where cascades 
propagate following the shortest paths. We define a perfect 
shortest path (sp) tree rooted at a given source node s as a 
perfect consistent tree, such that for each observation point 
$(v, t) \in X$ of the tree, $t = \mathrm{dist}(s, v)$; in other words, the 
path from s to v in the tree is the shortest path in G. The 
PCTw (resp. PCTmin) problem for sp trees is to identify the 
sp trees with the maximum likelihood (resp. minimum size). 

Proposition 2: Given a graph G and a partial observation 
X, (a) it is in PTIME to find an sp tree w.r.t. X; (b) the PCTmin 
and PCTw problems for perfect sp trees are NP-hard and 
APX-hard; (c) the PCTw problem is approximable within 
$O(d \cdot \frac{\log f_{min}}{\log f_{max}})$, where d is the diameter of G, and $f_{max}$ 
(resp. $f_{min}$) is the maximum (resp. minimum) probability 
by the diffusion function f. 

We next provide an approximation algorithm for the PCTw 
problem for sp trees. Given a graph G and a partial 
observation X, the algorithm, denoted as WPCT_sp (not 
shown), first constructs the backbone graph $G_b$ as in the 
algorithm WPCT. It then constructs node sets $V_r = \{v \mid (v, t) \in X\}$, 
and $V_s = V \setminus V_r$. Treating $V_r$ as required nodes, $V_s$ as 
steiner nodes, and the log-likelihood function as the weight 
function, WPCT_sp approximately computes an undirected 
minimum steiner tree T. If the directed counterpart T' of T 
in $G_b$ is not a tree, WPCT_sp transforms T' to a tree: for 
each node v in T' with more than one parent, it (a) connects 
s and v via the shortest path, and (b) removes the redundant 
edges attached to v. It then returns T' as an sp tree. 

One may verify that (a) T' is a perfect sp tree w.r.t. X, (b) 
the weight $-L_X(T')$ is bounded by $O(d \cdot \frac{\log f_{min}}{\log f_{max}})$ times 
the optimal weight, and (c) the algorithm runs in time 
polynomial in $|V|$, leveraging the approximation algorithm for the steiner 
tree problem [32]. Moreover, the algorithm WPCT_sp can be 
used for the problem PCTmin for sp trees, where each edge 
in G has the same weight. This achieves an approximation 
ratio of d. 

IV. Cascades as bounded trees 

In this section, we investigate the cascade inference 
problems for bounded consistent trees. In contrast to the 
intractable counterpart in Proposition 1(a), the problem of 
finding a bounded consistent tree for a given graph and a 
partial observation is in PTIME. 

Proposition 3: For a given graph G and a partial observation 
X, there is a bounded consistent tree in G w.r.t. X 
if and only if for each $(v, t) \in X$, $\mathrm{dist}(s, v) \le t$, where 
$\mathrm{dist}(s, v)$ is the distance from s to v in G. 

Indeed, one may verify the following: (a) if there is 
a node $(v_i, t_i) \in X$ where $\mathrm{dist}(s, v_i) > t_i$, there is no 
path that satisfies the time constraint and T is empty; (b) if 
$\mathrm{dist}(s, v_i) \le t_i$ for each node $(v_i, t_i) \in X$, a BFS tree 
rooted at s with each node $v_i$ in X as its internal node or 
leaf is a bounded consistent tree. Thus, determining whether 
there is a bounded consistent tree takes $O(|E|)$ time, via a 
BFS traversal of G from s. 
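
The test in Proposition 3 amounts to a single breadth-first search. The following sketch is illustrative; the name `has_bounded_consistent_tree` and the edge-list input format are assumptions.

```python
from collections import defaultdict, deque


def has_bounded_consistent_tree(edges, source, observation):
    """Return True iff dist(s, v) <= t for every observation point (v, t).

    edges       -- iterable of directed edges (u, v)
    observation -- iterable of observation points (v, t)
    """
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)

    # One BFS from the source gives the hop distance to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Unreachable nodes have no distance and therefore fail the test.
    return all(v in dist and dist[v] <= t for v, t in observation)


# Hypothetical three-node chain s -> a -> b.
chain = [("s", "a"), ("a", "b")]
print(has_bounded_consistent_tree(chain, "s", [("s", 0), ("b", 2)]))
print(has_bounded_consistent_tree(chain, "s", [("s", 0), ("b", 1)]))
```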

Given a graph G and a partial observation X, the minimum 
weighted bounded consistent tree problem, denoted as 
BCTw, is to identify the bounded consistent tree $T^*$ w.r.t. 
X with the minimum $-\log L_X(T^*)$ (see Section II). 



Input: graph G and partial observation X.
Output: a bounded consistent tree T in G.

1. tree $T = (V_T, E_T)$, where $V_T := \{s \mid (s, 0) \in X\}$, $E_T := \emptyset$;
2. compute $t_k$ bounded BFS DAG $G_d$ of s in G;
3. for each $t_i \in [t_1, t_k]$ do
4.     for each node v where $(v, t_i) \in X$ and $l(v) = i$ do
5.         if $i > t_i$ then return $\emptyset$;
6.         find a path p from s to v with the minimum
           weight $w(p) = -\sum \log f(e)$ for each $e \in p$;
7.         $T = T \cup p$;
8. return T as a bounded consistent tree;



Figure 5: Algorithm WBCT: searching bounded consistent 
trees via top-down strategy 

Theorem 1: Given a graph G and a partial observation
X, the BCTw problem is

(a) NP-complete and APX-hard; and

(b) approximable within $O(|X| \cdot \frac{\log f_{min}}{\log f_{max}})$, where $f_{max}$
(resp. $f_{min}$) is the maximum (resp. minimum) probability
by the diffusion function f over G.

We can prove Theorem 1(a) as follows. First, the BCTw
problem, as a decision problem, is to determine whether
there exists a bounded consistent tree T with $-L_X(T)$ no
greater than a given bound B. The problem is obviously in
NP. To show the lower bound, one may show that there exists a
polynomial time reduction from the exact 3-cover problem
(X3C). Second, to see the approximation hardness, one may
verify that there exists an AFP-reduction from the minimum
directed Steiner tree (MST) problem.

We next provide a polynomial time algorithm, denoted as
WBCT, for the BCTw problem. The algorithm runs in linear
time w.r.t. the size of G, with the performance guarantee
stated in Theorem 1(b).

Algorithm. The algorithm WBCT is illustrated in Fig. 5.
Given a graph G and a partial observation X, the algorithm
first initializes a tree $T = (V_T, E_T)$ with the single source
node s (line 1). It then computes the $t_k$ bounded BFS
directed acyclic graph (DAG) $G_d$ of the source node s,
where $t_k$ is the maximum time step of the observation points
in X, and $G_d$ is a DAG induced by the nodes and edges
visited by a BFS traversal of G from s (line 2). Following
a top-down strategy, for each node v of $(v, t) \in X$, WBCT
then (a) selects a path p with the minimum $-\sum_{e \in p} \log f(e)$ from
s to v, and (b) extends the current tree T with the path
p (lines 3-7). If for some observation point $(v, t) \in X$,
dist(s, v) > t, then WBCT returns $\emptyset$ as the tree T (line 5).
Otherwise, the tree T is returned (line 8) after all the
observation points in X are processed.
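
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' C++ implementation: the function name `wbct`, the adjacency-dictionary graph, and the dictionary `f` from edges to diffusion probabilities are assumptions made for the example. One design choice is also an assumption: instead of searching a path per observed node, the sketch records a single minimum-weight parent for every node of the BFS DAG, so that the union of the selected paths is a tree by construction.

```python
import math
from collections import deque


def wbct(graph, f, source, observation):
    """Sketch of WBCT. graph: node -> successors; f: (u, v) -> probability
    in (0, 1]; observation: node -> time step. Returns a set of tree edges,
    or None when no bounded consistent tree exists."""
    t_k = max(observation.values())
    level = {source: 0}   # BFS depth, i.e. dist(s, v)
    best = {source: 0.0}  # minimum -sum(log f(e)) over BFS-DAG paths
    parent = {}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if level[u] == t_k:  # the DAG is bounded by the largest time step
            continue
        for v in graph.get(u, ()):
            if v not in level:
                level[v] = level[u] + 1
                best[v] = math.inf
                queue.append(v)
            if level[v] == level[u] + 1:  # (u, v) is an edge of the BFS DAG
                w = best[u] - math.log(f[(u, v)])
                if w < best[v]:
                    best[v], parent[v] = w, u
    tree = set()
    # Top-down: process observation points by increasing time step.
    for v, t in sorted(observation.items(), key=lambda item: item[1]):
        if v not in level or level[v] > t:
            return None  # dist(s, v) > t: no bounded consistent tree
        while v != source and (parent[v], v) not in tree:
            tree.add((parent[v], v))
            v = parent[v]
    return tree
```

Because BFS dequeues all nodes of one level before any node of the next, `best[u]` is final when `u` is expanded, so a single pass over the edges suffices.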

Correctness and complexity. One may verify that algorithm
WBCT either correctly computes a bounded consistent
tree T, or returns $\emptyset$. For each node in the observation points
of X, a path of minimum weight is selected using a greedy strategy,
and the top-down strategy guarantees that the paths form a
consistent tree. The algorithm runs in time $O(|E|)$, since it
visits each edge at most once following a BFS traversal.

We next show the approximation ratio in Theorem 1(b).
Observe that for a single node v in X, (a) the total weight
of the path w from s to v is no greater than $-|w| \log f_{min}$,
where $|w|$ is the length of w, and (b) the weight of
the counterpart of w in $T^*$, denoted as $w'$, is no less
than $-|w'| \log f_{max}$. Also observe that $|w| \le |w'|$. Thus,
$w/w' \le \frac{\log f_{min}}{\log f_{max}}$. As there are in total $|X|$ such nodes,
$L_X(T)/L_X(T^*) \le |X| \cdot \frac{w}{w'} \le |X| \cdot \frac{\log f_{min}}{\log f_{max}}$. Theorem 1(b)
thus follows.

Minimum bounded consistent tree. We have considered 
the likelihood function as a quantitative metric for the 
quality of the bounded consistent trees. As remarked earlier, 
one may simply want to identify the bounded consistent 
trees of the minimum size. Given a social graph G and 
a partial observation X, the minimum bounded consistent 
tree problem, denoted as BCTmin, is to identify the bounded 
consistent tree with the minimum size, i.e., the total number 
of nodes and edges. The BCTmin problem is a special case 
of BCTw, and its main result is summarized as follows. 

Proposition 4: The BCTmin problem is (a) NP-complete,
(b) APX-hard, and (c) approximable within $O(|X|)$, where
$|X|$ is the size of the partial observation X.

Proposition 4(a) and 4(b) can both be shown by constructing
reductions from the MST problem, which is NP-complete
and APX-complete [32].

Despite the hardness, the problem can be approximated
within $O(|X|)$ in polynomial time, by applying the algorithm
WBCT over an instance where each edge has a unit
weight. This completes the proof of Proposition 4(c).

V. Experiments 

We next present an experimental study of our proposed 
methods. Using both real-life and synthetic data, we conduct 
three sets of experiments to evaluate (a) the effectiveness of 
the proposed algorithms, (b) the efficiency and the scalability 
of WPCT and WBCT. 

Experimental setting. We used real-life data to evaluate the 
effectiveness of our methods, and synthetic data to conduct 
an in-depth analysis on scalability by varying the parameters 
of cascades and partial observations. 

(a) Real-life graphs and cascades. We used the following
real-life datasets. (i) Enron email cascades. The dataset of
Enron Emails^1 consists of a social graph of 86,808 nodes
and 660,642 edges, where a node is a user, and two nodes
are connected if there is an email message between them.
We tracked the forwarded messages of the same subjects and
obtained 260 cascades of depth no less than 3 with more
than 8 nodes. (ii) Retweet cascades (RT). The dataset of
Twitter Tweets^2 [35] contains more than 470 million posts
from more than 17 million users, covering a period of 7
months from June 2009. We extracted the retweet cascades
of the identified hashtags [35]. To guarantee that a cascade
represents the propagation of a single hashtag, we removed
those retweet cascades containing multiple hashtags. In the
end, we obtained 321 cascades of depth more than 4, with
node size ranging from 10 to 81. Moreover, we used the
EM algorithm from [30] to estimate the diffusion function.

^1 http://www.cs.cmu.edu/~enron/

^2 http://snap.stanford.edu/data/twitter7.html

Figure 6: The prec and rec of the inference algorithms over Enron email cascades and Retweet cascades

(b) Synthetic cascades. We generated a set of synthetic cascades
unfolding in an anonymous Facebook social graph^3,
which exhibits properties such as power-law degree distribution,
high clustering coefficient and positive assortativity [34].
The diffusion function is constructed by randomly
assigning real numbers between 0 and 1 to edges in the
network. The generating process is controlled by the size $|T|$.
We randomly choose a node as the source of the cascade. By
simulating the diffusion process following the independent
cascade model, we then generated cascades w.r.t. $|T|$ and
assigned time steps.
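
A minimal Python sketch of such a generator is given below, assuming the independent cascade model in its usual form: each newly activated node gets one chance to activate each inactive neighbor, with the edge's diffusion probability. The function name `simulate_cascade`, the `max_size` stopping rule standing in for the size parameter $|T|$, and the data layout are assumptions for illustration only.

```python
import random


def simulate_cascade(graph, f, source, max_size, rng=None):
    """Simulate one independent-cascade run from source.
    graph: node -> successors; f: (u, v) -> activation probability.
    Returns (times, parent): activation time step and cascade parent."""
    rng = rng or random.Random()
    times = {source: 0}
    parent = {}
    frontier = [source]
    step = 0
    while frontier and len(times) < max_size:
        step += 1
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v in times:
                    continue  # already activated
                if rng.random() < f[(u, v)]:  # single activation attempt
                    times[v] = step
                    parent[v] = u
                    next_frontier.append(v)
                    if len(times) >= max_size:
                        break
            if len(times) >= max_size:
                break
        frontier = next_frontier
    return times, parent
```

The returned `parent` map encodes the cascade tree, and `times` gives the time step attached to each node.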

(c) Partial observation. For both real-life and synthetic
cascades, we define the uncertainty of a cascade T as
$a = 1 - \frac{|X|}{|V_T|}$, where $|V_T|$ is the number of nodes in T, and $|X|$ is
the size of the partial observation X. We remove nodes
from the given cascades until the uncertainty is satisfied, and
collect the remaining nodes and their time steps as X.
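
The removal step can be sketched as follows in Python. The sketch assumes that nodes are removed uniformly at random and that the source is always kept in X; neither choice is stated in the text, and the function name `partial_observation` is hypothetical.

```python
import math
import random


def partial_observation(times, source, uncertainty, rng=None):
    """Build X from a cascade given as node -> time step, so that
    1 - |X| / |V_T| >= uncertainty. The source is always kept (assumption)."""
    rng = rng or random.Random()
    # Largest |X| that satisfies the uncertainty, but at least the source.
    keep = max(1, math.floor((1 - uncertainty) * len(times)))
    others = [v for v in times if v != source]
    kept = rng.sample(others, keep - 1)  # uniform choice (assumption)
    observed = {source: times[source]}
    observed.update((v, times[v]) for v in kept)
    return observed
```

For a cascade with 10 nodes and uncertainty 0.5, the function keeps 5 observation points, including the source.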

(d) Implementation. We have implemented the following in
C++: (i) algorithms WPCT and WBCT; (ii) two linear
programming algorithms PCTlp and BCTlp, which identify
the optimal perfect consistent trees and the optimal weighted
bounded consistent trees using linear programming,
respectively; (iii) two randomized algorithms PCTr
and BCTr, which are developed to randomly choose trees
from given graphs. PCTr is developed using a similar
strategy to WPCT, except that for each level the Steiner
forest is randomly selected (see Section III); as WBCT
does, BCTr runs on bounded BFS directed acyclic graphs,
but randomly selects edges; (iv) to verify various implementations
of WPCT, an algorithm PCTg is developed by
using a greedy strategy to choose the Steiner forest for each
level (see Section III). We used LP_solve 5.5^4 as the linear
programming solver.

Precision        [four columns; of the column headers, only "Twitter" survives]
WPCT   prec_v    100%     100%     97.2%    93.2%
WPCT   prec_e    78.2%    82.4%    86.1%    82.6%
WBCT   prec_v    100%     70.1%    73.6%    66.1%
WBCT   prec_e    69%      55.7%    60.6%    41.7%

Table I. prec_v and prec_e over real cascades

We used a machine powered by an Intel(R) Core 2.8GHz
CPU and 8GB of RAM, running Ubuntu 10.10. Each experiment
was run 10 times and the average is reported here.

Experimental results. We next present our findings.

Effectiveness of consistent trees. In the first set of experiments,
using real-life cascades, we investigated the accuracy
and the efficiency of our cascade inference algorithms.

(a) Given a set of real-life cascades $\mathcal{T} = \{T_1, \ldots, T_k\}$,
for each cascade $T_i = (V_{T_i}, E_{T_i}) \in \mathcal{T}$, we computed an
inferred cascade $T_i' = (V_{T_i'}, E_{T_i'})$ according to a partial
observation with uncertainty a. Denote the nodes in the
partial observation as $V_X$. We evaluated the precision and the recall as

$$prec = \frac{\sum_i |(V_{T_i'} \cap V_{T_i}) \setminus V_X|}{\sum_i |V_{T_i'} \setminus V_X|}, \qquad rec = \frac{\sum_i |(V_{T_i'} \cap V_{T_i}) \setminus V_X|}{\sum_i |V_{T_i} \setminus V_X|}.$$

Intuitively, prec is the fraction of the inferred nodes that are
indeed missing nodes of $T_i$, while rec is the fraction of the
missing nodes of $T_i$ that are inferred by $T_i'$.
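
The two measures can be computed as in the Python sketch below. It assumes that prec is normalized by the inferred nodes outside the observation and rec by the true cascade nodes outside the observation, which is one reading of the intuition given above; the function name `precision_recall` and the tuple layout are illustrative only.

```python
def precision_recall(cascades):
    """cascades: iterable of (true_nodes, inferred_nodes, observed_nodes),
    each a set. Returns (prec, rec) aggregated over all cascades."""
    hit = inferred_new = missing = 0
    for true_nodes, inferred_nodes, observed in cascades:
        # Inferred nodes that are real cascade nodes and were not observed.
        hit += len((inferred_nodes & true_nodes) - observed)
        inferred_new += len(inferred_nodes - observed)
        missing += len(true_nodes - observed)
    prec = hit / inferred_new if inferred_new else 0.0
    rec = hit / missing if missing else 0.0
    return prec, rec
```

For example, with true nodes {1, 2, 3, 4, 5}, inferred nodes {1, 2, 3, 6} and observed nodes {1, 2}, one of the two newly inferred nodes is correct and one of the three missing nodes is recovered.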



^3 http://current.cs.ucsb.edu/socialnets

^4 http://lpsolve.sourceforge.net/5.5/
For Enron email cascades, Fig. 6(a) and Fig. 6(b) show the
accuracy of WPCT, PCTg and PCTr for inferring cascades,
while a is varied from 0.25 to 0.85. PCTlp does not scale
over the Enron dataset and thus is not shown. (i) WPCT
outperforms PCTg and PCTr on both prec and rec. (ii)
When the uncertainty increases, both the prec and rec of the
three algorithms decrease. In particular, WPCT successfully
infers cascade nodes with prec no less than 70% and rec
no less than 25% even when 85% of the nodes in the
cascades are removed. Using the same setting, the performance
of WBCT, BCTlp and BCTr is shown in Fig. 6(c)
and Fig. 6(d), respectively. (i) Both BCTlp and WBCT
outperform BCTr, and their prec and rec decrease while
the uncertainty increases. (ii) BCTlp has better performance
than WBCT. In particular, both BCTlp and WBCT successfully
infer the cascade nodes with prec no less than 50%
and rec no less than 25%, even when 85% of the
nodes in the cascades are removed.

For retweet cascades, the prec and the rec
of WPCT, PCTg and PCTr are shown in Fig. 6(e)
and in Fig. 6(f), respectively. While the uncertainty
increases from 0.25 to 0.85, (i) WPCT outperforms PCTr
and PCTg, and (ii) the performance of all the algorithms
decreases. In particular, WPCT successfully infers the
nodes with prec more than 80% and rec more than
35% when the uncertainty is 25%. Similarly, the prec and
the rec of WBCT and BCTr are presented in Fig. 6(g)
and Fig. 6(h), respectively. As BCTlp does not scale on
retweet cascades, its performance is not shown. While
the uncertainty a increases, the prec and the rec of the
algorithms decrease. For all a, WBCT outperforms BCTr;
in particular, WBCT correctly infers the nodes with prec no
less than 60% and rec no less than 25% when a is 25%.

(b) To further evaluate the structural similarity of $T_i$ and
$T_i'$ as described in (a), we also evaluate (i) prec_v, the
fraction of the nodes in $V' = (V_{T_i'} \cap V_{T_i}) \setminus V_X$ that have
the same topological order in both $T_i'$ and $T_i$, and (ii) prec_e,
computed over the common edges $E' = E_{T_i} \cap E_{T_i'}$, following
the metric for measuring graph similarity [26]. The average
results are shown in Table I for a = 50% and
cascades of fixed depth. As shown in the table, for WPCT,
the average prec_v is above 90%, and the average prec_e is
above 75% over both datasets. Better still, the results hold
even when we set a = 85%. For WBCT, prec_v and prec_e
are above 65% and above 40%, respectively. For WPCT,
prec_v and prec_e have almost consistent performance on both
datasets; however, for WBCT, the prec_v and prec_e of the
inferred Enron cascades are higher than those of the inferred
retweet cascades. The gap might result from the different
diffusion patterns of these two datasets: we observed
that more than 70% of the cascades in the Enron
dataset have structures contained in the BFS directed
acyclic graphs of WBCT, while in the Twitter Tweets
less than 45% of the retweet cascades follow the assumed
graph structures of WBCT.

Efficiency over real datasets. In all the tests over real
datasets, PCTr, BCTr, PCTg and WBCT take less than
1 second. BCTlp does not scale for retweet cascades, while
PCTlp does not scale for either dataset. On the other hand,
while WPCT takes less than 0.4 seconds in inferring all the
Enron cascades, it takes less than 20 seconds to infer Twitter
cascades where d = 4, and 100 seconds when d = b. Indeed,
for the Twitter network the average degree of the nodes is 20,
while the average degree for the Enron dataset is 7. As such, it
takes more time for WPCT to infer Twitter cascades in the
denser Twitter network. In our tests, the efficiency of all the
algorithms is not sensitive w.r.t. the changes to a.

Efficiency and scalability over synthetic datasets. In the 
second set of experiments, we evaluated the efficiency and 
the scalability of our algorithms using synthetic cascades. 

(a) We first evaluate the efficiency and scalability of WPCT 
and compare WPCT with PCTr and PCTg. 

Fixing uncertainty a = 50%, we varied $|T|$ from 30 to
240. Fig. 7(c) shows that WPCT scales well with the size
of the cascade. Indeed, it only takes 2 seconds to infer the
cascades with 300 nodes.

Fixing size $|T|$ = 100, we varied the uncertainty a from
0.25 to 0.85. Fig. 7(d) illustrates that while all the three
algorithms are more efficient with larger a, WPCT is more
sensitive. All the three algorithms scale well with a.

As PCTlp does not scale well, its performance is not
shown in Fig. 7(c) and Fig. 7(d).

(b) Using the same setting, we evaluated the performance
of WBCT, compared with BCTlp and BCTr.

Table II. Summary: complexity and approximability

Fixing a and varying $|T|$, the result is reported in
Fig. 7(a). First, WBCT outperforms BCTlp, and is almost as
efficient as the randomized algorithm BCTr. For the cascade
of 240 nodes, WBCT takes less than 0.5 second to infer the
structure, while BCTlp takes nearly 1000 seconds. Second,
while WBCT is not sensitive to the change of $|T|$, BCTlp is
much more sensitive.

Fixing $|T|$ and varying a, Fig. 7(b) shows the performance
of the three algorithms. The figure tells us that WBCT
and BCTr are less sensitive to the change of a than BCTlp.
This is because WBCT and BCTr identify bounded consistent
trees by constructing shortest paths from the source
to the observed nodes. When the maximum depth of the
observation points is fixed, the total number of nodes and
edges visited by WBCT and BCTr is not sensitive to a.

Summary. We can summarize the results as follows. (a)
Our inference algorithms can infer cascades effectively.
For example, the original cascades and the ones inferred
by WPCT have structural similarity (measured by prec_e) of
higher than 75% in both real-life datasets. (b) Our algorithms
scale well with the sizes of the cascades and with the uncertainty.
They seldom demonstrated their worst-case complexity. For
example, even for cascades with 240 nodes, all of our
algorithms take less than two seconds.

VI. Conclusion 
In this paper, we investigated the cascade inference problem
based on partial observations. We proposed the notions of
consistent trees for capturing the inferred cascades, namely,
bounded consistent trees and perfect consistent trees, as well
as quantitative metrics that either minimize the size of the
inferred structure or maximize the overall likelihood. We
have established the intractability and the hardness results
for the optimization problems as summarized in Table II.
Despite the hardness, we developed approximation and
heuristic algorithms for these problems, with performance
guarantees on inference quality. We verified the effectiveness
and efficiency of our techniques using real-life and synthetic
cascades. Our experimental results have shown that our
methods are able to efficiently and effectively infer the
structure of information cascades.